SOME STATISTICAL PROCEDURES FOR THE JOINT OIL ANALYSIS PROGRAM
FINAL REPORT FOR PROJECT ORDER MME-77-006

by

D. R. Barr, H. J. Larson and T. Jayachandran

I. INTRODUCTION

The Joint Oil Analysis Program is a tri-service standardized program to monitor equipment wear condition through the use of oil analysis. Spectrometric oil analysis is used to determine the type and amount of wear metals in lubricating fluid samples. There are three primary factors that can affect the accuracy and effectiveness of oil analysis:

1. The daily spectrometer calibration routine and the particular oil standard used in the calibration.

2. The electrode type used in the analysis.

3. The experience and training of the spectrometer operator/evaluator.

This report describes statistical procedures developed under a project sponsored by the Joint Oil Analysis Program Technical Support Center (TSC), Pensacola, Florida, and funded by the Engineering Division, Kelly Air Force Base, San Antonio, Texas.


Statistical procedures for acceptance testing of new batches of calibration standards are described in Section II. A three-part statistical procedure for certification of the spectrometric laboratories is presented in Section III. Section IV deals with statistical acceptance tests of electrodes from different suppliers.

In all three sections certain results of analyses of experimental data supplied by the TSC are quoted. These data consisted of acceptance testing readings of prepared oil standards by three laboratories under ideal conditions. Since these ideal conditions are not expected to occur in routine daily work, one should be careful not to extrapolate these results to more general situations. The numbers used in the worked examples came from the same source and, again, may not be typical of what can be expected in day-to-day laboratory work.

The authors would like to acknowledge the kind and generous assistance of Mr. Richard S. Lee, Senior Army Representative of the Joint Oil Analysis Program Technical Support Center, Pensacola, Florida. Any errors of reasoning which may remain are the sole responsibility of the authors.


II. CALIBRATION STANDARDS

II.1. Introduction

The methods and criteria we suggest for acceptance testing of calibration standards are an adaptation of accepted statistical procedures to accommodate specific features of JOAP data. We therefore begin with a discussion of some features of these data, based on sampling the calibration data provided to us by the JOAP-TSC. Next, the problem of determining tolerance values (both for accuracy and repeatability) is discussed, with reference to the Baird Atomic acceptance numbers and the tolerances published by the JOAP-TSC. Finally, a test procedure is suggested for determining the acceptability of new reference standards.

II.2. Characteristics of JOAP data

Various calibration test data sets provided by the JOAP-TSC were sampled to provide estimates of variance-covariance matrices, as well as Repeatability Index characteristics, over elements, laboratories and concentrations. As an example, Table 1 shows estimated variances (on the main diagonal), covariances (above the main diagonal) and correlations (below the main diagonal) for R-1 at 100 ppm at the Corpus Christi Lab. Typically, most of the correlations are positive and many are quite large. For example, the estimated correlation between Pb and Al analyses is .96. This means that, within a single analysis, a Pb reading above 100 was very likely to be accompanied by an Al reading also above 100; indeed, the relationship between Pb in a given analysis and Al in that same analysis was essentially linear (with positive slope).
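To make the layout of Table 1 concrete, the sketch below computes variances (diagonal), covariances (above the diagonal) and correlations (below it) from raw readings. It is a minimal Python illustration, not the procedure used to produce Table 1; the function name `table1_layout` and the dict-of-lists input are hypothetical, and it assumes the i-th reading of every element comes from the same burn.

```python
import math


def table1_layout(burns):
    """Return (elements, matrix) in the layout of Table 1.

    burns maps an element symbol to its list of readings; the i-th reading
    of every element must come from the same burn (analysis).
    Diagonal: sample variances. Above: covariances. Below: correlations.
    """
    elements = list(burns)
    n = len(burns[elements[0]])
    if any(len(v) != n for v in burns.values()):
        raise ValueError("every element needs one reading per burn")
    means = {e: sum(v) / n for e, v in burns.items()}

    def cov(a, b):
        # Sample covariance (divisor n - 1) of two elements across burns.
        return sum((x - means[a]) * (y - means[b])
                   for x, y in zip(burns[a], burns[b])) / (n - 1)

    k = len(elements)
    matrix = [[0.0] * k for _ in range(k)]
    for i, a in enumerate(elements):
        for j, b in enumerate(elements):
            if i <= j:
                matrix[i][j] = cov(a, b)  # this is the variance when i == j
            else:
                matrix[i][j] = cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))
    return elements, matrix
```

With Pb and Al readings from the same burns, a lower-triangle entry near .96 would correspond to the strong positive correlation described above.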

Such correlations substantially increase the computational difficulty of a reference testing procedure that simultaneously incorporates data from all elements. Therefore, we recommend a procedure that continues the present practice of performing separate analyses for each element. Even so, the correlation among analyses for various elements (within a sample run) makes precise evaluation of the overall error rates of a testing procedure difficult, a point we shall return to below.

In order to get an idea of the consistency of the repeatability index over elements, labs and time, the variance of the analyses within individual sample runs was estimated for a number of situations. For example, Table 2 shows estimates made from data sets 1 and 5 in the data provided by the JOAP-TSC. From these analyses, the following conclusions were reached:
1) Variances among elements may differ significantly.

2) There is a weak but discernible pattern of variance sizes among elements (for example, Cu is among the lowest and Mo is among the highest).

3) There seems to be no consistent pattern of variance sizes among labs.

4) Variance patterns for the two standards within a sample run tend to be consistent. That is, high variance in R-1 Pb tends to go with high variance in R-2 Pb for a given sample run.

5) High variance for one element in a sample run does not imply that other elements in that sample run are also outside reasonable variance standards.

The above conclusions pertain to the particular data set on which they are based and may not be typical of day-to-day routine readings.

Based on these conclusions, the following recommendations are made concerning the test procedure:

1) Do the standards acceptability test separately for each element (further supporting the present procedure in this regard).

2) Since the reference standards are prepared by the TSC, and a spectrometer is available to the TSC at Pensacola, complete all reference standard acceptance testing at the TSC.


II.3. Tolerance Specifications

The data provided by the JOAP-TSC were used to investigate how the repeatability index responds to changes in concentration of a particular element, and to determine whether the response characteristics are the same for all elements. This is important since a statistical procedure will measure the significance of apparent differences in mean concentration in terms of the underlying repeatability of analyses. It was found that, for most elements with concentrations in the range 0-100 ppm, the repeatability index increased as a quadratic function of initial concentration (see Figure 1). However, an adequate fit for practical purposes is obtained with a linear function; that is, for practical purposes one may assume $RI = mC_0 + b$, where $C_0$ is the initial concentration, $m$ is the rate of increase in RI with $C_0$, and $b$ is the intercept. As an example, Table 3 shows estimates of $b$ and $m$ for both the linear fit and the quadratic fit. These are based on R-1 analyses at the Pensacola Lab (last run), at concentrations of 3, 10, 30, 50 and 100 ppm. Figure 2 shows plots of the linear fit for 13 elements.

It was found that the elements appear to have different patterns of increase of RI with $C_0$. This suggests that a different tolerance criterion should be used for RI for each element. The adequacy of the linear and quadratic fits is indicated by the estimated correlation values $r$ in Table 3. Values of .95 or more indicate a satisfactory fit; values in excess of .98 indicate quite a close fit.

Figure 1. Plots of $\sqrt{RI} = mC_0 + b$ with R-1 at five levels of $C_0$.

Figure 2. Plots of $RI = mC_0 + b$ with R-1 at five levels of $C_0$.
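A least-squares fit of the linear model $RI = mC_0 + b$, together with the correlation $r$ used above to judge adequacy, can be sketched as follows. This is an illustrative Python sketch only: `fit_ri_line` is a hypothetical name, only the linear fit is covered (not the quadratic one), and the RI values passed in would have to come from the laboratory's own runs at 3, 10, 30, 50 and 100 ppm.

```python
import math


def fit_ri_line(conc, ri):
    """Least-squares fit of RI = m*C0 + b.

    Returns (m, b, r), where r is the sample correlation between C0 and RI.
    """
    n = len(conc)
    cbar = sum(conc) / n
    rbar = sum(ri) / n
    sxx = sum((c - cbar) ** 2 for c in conc)
    syy = sum((v - rbar) ** 2 for v in ri)
    sxy = sum((c - cbar) * (v - rbar) for c, v in zip(conc, ri))
    m = sxy / sxx              # slope: rate of increase of RI with C0
    b = rbar - m * cbar        # intercept
    r = sxy / math.sqrt(sxx * syy)
    return m, b, r
```

By the criterion in the text, a returned `r` of .95 or more would indicate a satisfactory linear fit for that element.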

Values of RI computed for initial concentrations greater than 100 ppm, which were diluted to 100 ppm for analysis, were not significantly different from those for undiluted 100 ppm initial concentrations. Note: It was found that several sample runs for $C_0$ greater than 100 ppm had R-1 data identical with other sample runs. For example, data set 31 has the same R-1 data as data set 34 (and 30 has the same as 37). Thus, it appears that the R-1 concentrations for runs with initial concentrations above 100 ppm (as shown on the computer print-outs) were in fact obtained with undiluted R-1 standard at 100 ppm, in conjunction with other standards testing. If this is the case, no difference in RI due to dilution of more concentrated samples would, of course, exist for R-1 data.

Tolerances are needed for both accuracy and repeatability of sample runs. Following accepted statistical principles, the accuracy tolerances should depend on the inherent repeatability of the analysis process. Thus, with analysis procedures having high variance, one could detect only large differences in the standards under test (if one desired to control, at specified levels, the probabilities of committing errors in one's conclusions). Theoretically, in order to test whether two standards have the same concentration of a given element, say iron, it is necessary to compare the difference in estimated levels in each standard, measured in standard deviation units, with a critical value taken from statistical tables. For purposes of illustration, we describe such a procedure in what follows. If $X_1, X_2, \ldots, X_n$ denote analyses of iron made with the old standard, and $Y_1, \ldots, Y_n$ denote analyses of iron in the new standard (with analyses alternating between old and new, as is current (and good) practice), then

$$T = \frac{\bar{X} - \bar{Y}}{S_{\bar{X}-\bar{Y}}}$$

is compared with $t_{2n-2;\,1-\alpha/2}$, where

$\bar{X}$ is the average of $n$ consecutive analyses with the old standard,

$\bar{Y}$ is the average of $n$ consecutive analyses with the new standard,

$$S_{\bar{X}-\bar{Y}} = \left[\frac{\sum_i (X_i - \bar{X})^2 + \sum_i (Y_i - \bar{Y})^2}{n(n-1)}\right]^{1/2}$$

is the estimated standard deviation of $\bar{X} - \bar{Y}$, and

$t_{2n-2;\,1-\alpha/2}$ is the tabulated $100(1-\alpha/2)$th percentile of the t-distribution with $2n-2$ degrees of freedom.
A test would reject equivalence of the old and new standards (and thus would reject the new standard for iron content) at the $\alpha$ level of significance if $|T| > t_{2n-2;\,1-\alpha/2}$, that is, if

$$|\bar{X} - \bar{Y}| > S_{\bar{X}-\bar{Y}}\; t_{2n-2;\,1-\alpha/2}.$$
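The equal-sample-size test just described can be written out directly. This is a minimal Python sketch: `two_sample_t` is a hypothetical helper, the critical value $t_{2n-2;\,1-\alpha/2}$ is passed in from a t-table rather than computed, and the alternating order of the burns is not used in the arithmetic (the analyses are treated as two independent samples).

```python
import math


def two_sample_t(x, y, t_crit):
    """Two-sample t statistic for old (x) vs new (y) standard, equal n.

    t_crit is the tabulated t_{2n-2; 1-alpha/2}.
    Returns (T, reject) where reject is True if |T| > t_crit.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("expects the same number of analyses of each standard")
    xbar = sum(x) / n
    ybar = sum(y) / n
    ss = sum((xi - xbar) ** 2 for xi in x) + sum((yi - ybar) ** 2 for yi in y)
    s_diff = math.sqrt(ss / (n * (n - 1)))  # estimated SD of xbar - ybar
    t = (xbar - ybar) / s_diff
    return t, abs(t) > t_crit
```

For $n = 10$ and $\alpha = .05$ the tabulated value is 2.101 (18 degrees of freedom), the figure used in the comparison with the Baird Atomic tolerances.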

The point of this illustration is not the test itself; rather, it is to demonstrate how a "tolerance," in this case $S \cdot t$, for testing accuracy ($\bar{X} - \bar{Y}$) is a linear function of the joint precision (repeatability), $S$. If different elements exhibit varying characteristics of change in repeatability with changes in initial concentration, then tolerance specifications should likewise vary over elements and initial concentrations. It is interesting to examine the accuracy and repeatability "acceptance" tolerances listed in Tables 4-14 and 4-15 of T.O. 33A6-7-24-1 (enclosure 2 of our data from TSC, hereafter referred to as "Baird Atomic" acceptance tolerances) from this point of view. It is easily verified that, within each group of elements, $AI_A$ is a linear function of $RI_A$. For example, for the group {Ni, Si, Al, Be, Cr}, $AI_A = 1.885\,RI_A + .233$, with a correlation very close to 1. (It is also interesting to note that $RI_A$ in Table 4-15 of T.O. 33A6-7-24-1 is, within each element group, nearly linear in initial concentration, consistent with our finding that linear functions provide acceptable fits of the apparent relationships between RI and $C_0$.)
Comparison of the relationships estimated from T.O. 33A6-7-24-1 with the theoretical coefficients from the t-tables can provide some idea of the error rate levels one might achieve using the Baird Atomic acceptance tolerances. Following the (2-sided, 2-sample t-test) argument above, and noting that, with RI estimating the common per-analysis standard deviation, $S_{\bar{X}-\bar{Y}} = RI\sqrt{2/n}$, theoretically

$$AI = \frac{\sqrt{2}\, t_{2n-2;\,1-\alpha/2}}{\sqrt{n}}\, RI.$$

For example, with $n = 10$ analyses from each standard and $\alpha = .05$ (the probability of rejecting the new standard for iron content, given that it in fact has the same concentration of iron as the old standard), we would have

$$AI = \frac{\sqrt{2}\,(2.101)}{\sqrt{10}}\, RI = .940\, RI.$$

From comparisons of Tables 4-14 and 4-15 of T.O. 33A6-7-24-1, we find approximately (for all groups of elements) $AI \approx 1.9\,RI + b$, where $b$ is a "calibration error allowance" of about .25 ppm. In order to obtain the slope 1.9 in this relationship with the t-test with $n = 10$, one would need to take $\alpha \approx .0005$. Based on this analysis, it appears that test procedures using the tolerances given in Table 4-14 are quite conservative; we suggest somewhat tighter tolerances with the procedure recommended below.
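The arithmetic behind the .940 coefficient, and the reverse calculation for the observed slope of 1.9, can be checked in a few lines. The helper names below are hypothetical; converting the resulting t value back into $\alpha$ still requires a t-table with $2n-2$ degrees of freedom.

```python
import math


def ai_slope(t_crit, n):
    """Coefficient of RI in AI = sqrt(2) * t / sqrt(n) * RI."""
    return math.sqrt(2.0 / n) * t_crit


def t_needed(slope, n):
    """t value that would give the stated slope with n analyses per standard."""
    return slope * math.sqrt(n / 2.0)


print(round(ai_slope(2.101, 10), 3))  # 0.94
print(round(t_needed(1.9, 10), 2))    # 4.25
```

The second value, about 4.25 with 18 degrees of freedom, is the critical point the text associates with $\alpha \approx .0005$.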
There appear to be two major goals in the standards testing activity. In roughly descending order of importance to the TSC, they are:

1) testing $R_1 = R_2$ for each element,

2) assuring that analyses meet repeatability specifications for each element.

In addition to the statistical considerations concerning the setting of tolerances discussed thus far (primarily the principle of setting tolerances in terms of the repeatability attained by the analysis process), several operational considerations are involved. These can be stated in terms of the practical consequences of committing "type I" and "type II" errors in testing for each of the goals listed above. A type I error occurs whenever a satisfactory product (standard) is judged unsatisfactory by the test procedure. This usually occurs because data are obtained (by chance) that do not fairly represent the "typical" data produced by the procedure. A type II error occurs when a product that is actually unacceptable is judged acceptable by the test procedure.

General features of such procedures include:

1) any screening or acceptance testing procedure will commit type I and type II errors from time to time, although the users of the procedure may not be aware of their occurrence,

2) as the type I error rate, $\alpha$, is made smaller (with $n$ fixed), the type II error rate, $\beta$, increases,

3) both $\alpha$ and $\beta$ can be made smaller by increasing the sample size, $n$, and

4) usually the type I error rate, $\alpha$, together with $n$, are taken as the control variables; the value of $\beta$ corresponding to a choice of $\alpha$ and $n$ is thus determined.
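Features 2) and 3) can be seen numerically. The sketch below substitutes a normal (known-$\sigma$) approximation for the two-sample t-test, so its $\beta$ values are only approximate; `type_ii_rate` is a hypothetical name, and `delta` is the true difference in means expressed in units of the per-analysis standard deviation, with $n$ analyses of each standard.

```python
from statistics import NormalDist


def type_ii_rate(alpha, n, delta):
    """Approximate beta for the two-sided, two-sample test of equal means.

    Normal approximation to the t-test: under the alternative, the test
    statistic is N(delta * sqrt(n/2), 1), and the standard is accepted
    when the statistic falls inside +/- z_{1-alpha/2}.
    """
    z = NormalDist()
    zc = z.inv_cdf(1 - alpha / 2)
    shift = delta * (n / 2) ** 0.5  # true difference in standard-error units
    return z.cdf(zc - shift) - z.cdf(-zc - shift)
```

Lowering `alpha` from .05 to .01 with `n` fixed raises the returned $\beta$, while doubling `n` at a fixed `alpha` lowers it.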

From an operational point of view, $\alpha$ and $n$ should be selected for each goal so as to give test procedures with error rates that reflect the importance of the goals and the seriousness (in terms of cost or loss) of committing type I and type II errors. For example, for the primary goal of testing $R_1 = R_2$, considerations include the implications of operating with a new standard having concentrations of one or more elements different from those of the previous standard, and the costs associated with rejecting a new batch of standard even though it was acceptable. We realize that assessing such costs and losses may be impossible in practice, although even rough estimates can be useful in determining appropriate levels of $\alpha$ and $n$.
For establishing tolerances for accuracy-related tests ($R_1 = R_2$), the selection of $\alpha$ and $n$ constitutes the tolerance. That is, in place of an absolute tolerance (such as "± 3 ppm") we specify tolerances relative to the repeatability of the analysis system by setting $\alpha$ and $n$. This has the advantage of relating tolerances directly to the operating characteristics of the test procedure, with immediate operational interpretation. It should be noted that testing accuracy is in reality testing relative accuracy. We are testing whether the new standard gives readings essentially the same as the old standard, not whether the new standard contains "3 ppm of Cu," for example. Because of the role of frequent recalibration of the spectrometers, the impossibility of maintaining absolute control of contaminant level in ppm is not a problem. Assuring that the relative contents of the old and new standards are essentially the same must (and will) suffice.

For establishing tolerances for testing precision, we also follow the principles discussed above. We have noted that, in absolute terms, the repeatability observed in sample runs will generally depend upon concentration levels, as well as on the elements under test. Thus the repeatability tolerances must vary with concentration level and element. If good laboratory procedures are strictly adhered to, a high value of RI would indicate spectrometer malfunction rather than any defect in the standard being tested. Thus our suggested procedure includes monitoring the RI values, but if RI is "too high" for some set of analyses, it is the operating procedure or the spectrometer that is suspect, not the preparation of the standard being tested.

In the absence of clear notions concerning costs and losses due to commission of errors in testing for the various goals, we use "default values" of $\alpha$ and take $n = 10$ in the procedures we describe in the following section. After some experience with these procedures has been gained, these values can be adjusted if necessary to give rejection rates that suit the TSC.


II.4. The Test Procedure

Now let us describe the suggested procedure for acceptance testing of prepared reference standards. We shall call the prepared standard to be tested the candidate reference standard. Five different concentration levels (3, 10, 30, 50 and 100 ppm) are to be tested. As already mentioned, we recommend that the elements be analyzed individually, for each concentration, even though the spectrometer readings for all 13 (or 20) elements are determined simultaneously. If a candidate reference standard fails the test described below in one or more elements at a given concentration level, the candidate must then be remixed to bring the errant element(s) into line (if possible) and then retested for all elements, not just the one (or more) that originally failed. Should the candidate fail a second time, it must then be discarded, or possibly remixed again for consideration as being acceptable at some higher or lower concentration level.

It is assumed that the spectrometer has been accurately standardized at 0 ppm and at 100 ppm, using a previously accepted primary reference standard. Ten burns ($n = 10$) are made of the candidate standard at each specified concentration level. Let $X_1, X_2, \ldots, X_{10}$ be the 10 readings obtained for a specified element, let $\bar{X}$ be their average, and let RI be the repeatability index for these 10, computed in the usual way. As a first step the RI value should be compared with the appropriate entry in Table 4. (See the discussion at the end of this section regarding the origin of Table 4.) If RI exceeds the tabled value for the specified concentration-element combination, then the procedure or the spectrometer itself would appear to be faulty. The spectrometer should be re-standardized and a new set of 10 burns run, carefully following accepted laboratory procedures. If RI for the same element is again too large, it would appear that the spectrometer is out of order; no further testing of the candidate reference standards can be accomplished until it is repaired.
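The first screening step (compute RI from the 10 burns and compare it with Table 4) can be sketched as follows. RI is computed as the sample standard deviation with divisor $n-1$, matching the formula in the summary of this section. Only the Fe and Cu rows of Table 4 are copied into the hypothetical `RI_LIMITS` dictionary, and the function names are illustrative.

```python
import statistics

# Limiting RI values (ppm) from Table 4, keyed by element and nominal
# concentration. Only two rows are copied here; add the others as needed.
RI_LIMITS = {
    "Fe": {3: 0.42, 10: 0.54, 30: 1.33, 50: 2.27, 100: 5.04},
    "Cu": {3: 0.25, 10: 0.53, 30: 1.52, 50: 1.67, 100: 4.08},
}


def repeatability_index(readings):
    """RI = sqrt(sum((x - mean)^2) / (n - 1)), the sample standard deviation."""
    return statistics.stdev(readings)


def ri_acceptable(element, conc, readings):
    """True if RI does not exceed the Table 4 limit for this element/concentration."""
    return repeatability_index(readings) <= RI_LIMITS[element][conc]
```

A `False` result corresponds to the case above in which the spectrometer is re-standardized and 10 new burns are run.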

Granted that the RI value does not exceed the appropriate value in Table 4, a 99% (or some other level if more appropriate) confidence interval for the mean of the population from which the 10 numbers were selected is computed as described below. (The values in Table 4 were computed from repeated runs made under ideal conditions. The values presented for RI in this table may in some cases be unrealistically low for daily use.)


TABLE 4. Suggested Limiting Values for RI.

                    Concentration (ppm)
Element      3      10      30      50     100
Fe         .42     .54    1.33    2.27    5.04
Ag         .17     .49    1.33    2.13    5.31
Al         .73     .93    1.68    1.85    4.58
Cr         .46     .60    1.44    1.65    3.42
Cu         .25     .53    1.52    1.67    4.08
Mg         .30     .83    1.65    2.71    5.91
Na         .22     .94    1.82    2.05    4.74
Ni         .68    1.08    1.74    2.89    5.76
Pb         .88     .89    1.24    2.71    4.65
Si         .37     .60    1.46    2.00    3.48
Sn        1.07    1.38    1.57    1.75    4.48
Ti         .84     .94    1.55    2.99    4.60
Mo        1.00    1.00    1.92    3.32    7.53
The 99.5th quantile of the t-distribution with 9 degrees of freedom is $t_{.995} = 3.250$. The 99% confidence interval for the population mean then has endpoints $\bar{X} - (3.250)\,RI/\sqrt{10}$ and $\bar{X} + (3.250)\,RI/\sqrt{10}$, where RI is the repeatability index. [The general form for this $100(1-\gamma)\%$ interval is $\bar{X} \pm t_{1-\gamma/2}\,RI/\sqrt{n}$, where $t_{1-\gamma/2}$ is the $100(1-\gamma/2)$th quantile of the t-distribution with $n-1$ degrees of freedom and $n$ is the sample size, in case it is desired to change either the sample size or the confidence coefficient.]
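A minimal sketch of this confidence-interval check, assuming 10 burns so that the default critical value 3.250 ($t_{.995}$ with 9 degrees of freedom) applies; for another sample size or confidence level the matching tabulated value must be supplied. The function names are hypothetical.

```python
import math
import statistics


def mean_ci(readings, t_crit=3.250):
    """Interval xbar +/- t_crit * RI / sqrt(n), with RI the sample SD.

    The default t_crit is t_{.995} with 9 degrees of freedom, which gives
    a 99% interval when there are 10 readings.
    """
    n = len(readings)
    xbar = statistics.mean(readings)
    half = t_crit * statistics.stdev(readings) / math.sqrt(n)
    return xbar - half, xbar + half


def covers(interval, target):
    """True if target lies between the confidence limits (inclusive)."""
    lo, hi = interval
    return lo <= target <= hi
```

For a 30 ppm candidate, `covers(mean_ci(readings), 30)` returning `True` corresponds to accepting the candidate at this stage.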
If the desired true concentration of the candidate standard is covered by the confidence interval, accept the candidate standard as having the correct concentration of the element analyzed. If the confidence interval does not cover the desired true concentration, then the candidate may not have the correct concentration. To verify this conclusion, an additional 10 burns of the candidate standard should be made, alternating with burns of the primary reference standard of the same nominal concentration: candidate-primary-candidate-primary, etc. Let $Y_1, Y_2, \ldots, Y_{10}$ be the 10 new candidate readings with average $\bar{Y}$ and repeatability index $RI_Y$, and let $Z_1, Z_2, \ldots, Z_{10}$ be the 10 primary standard values with mean $\bar{Z}$ and repeatability index $RI_Z$. Both $RI_Y$ and $RI_Z$ should be no larger than the appropriate entry in Table 4; follow the instructions above about repeating the burns if either of them exceeds the tabular value. If both satisfy this requirement, compute the joint repeatability index by

$$S = \left[\tfrac{1}{2}\left(RI_Y^2 + RI_Z^2\right)\right]^{1/2}.$$
This in turn can be used to compute a confidence interval for the difference in true concentration of the candidate and reference standards as follows. The 99.5th quantile of the t-distribution with 18 degrees of freedom is $t_{.995} = 2.878$. The 99% confidence interval for the difference in true concentrations then has endpoints $\bar{Y} - \bar{Z} - (2.878)\,S/\sqrt{5}$ and $\bar{Y} - \bar{Z} + (2.878)\,S/\sqrt{5}$. (Here $S/\sqrt{5} = S\sqrt{2/10}$ is the estimated standard deviation of $\bar{Y} - \bar{Z}$.) If this interval contains zero, accept the candidate standard; if not, reject the candidate reference standard and conclude that its true concentration is not the desired level. It then must be remixed or discarded as described above.
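The follow-up comparison with the primary reference standard can be sketched in the same way. This assumes equal numbers of candidate and primary burns; with $n = 10$ the standard error $S\sqrt{2/n}$ reduces to the $S/\sqrt{5}$ of the text, and the default 2.878 is $t_{.995}$ with 18 degrees of freedom. `difference_ci` is a hypothetical name.

```python
import math
import statistics


def difference_ci(candidate, primary, t_crit=2.878):
    """99% interval for (candidate mean - primary mean), equal n.

    S = sqrt((RI_Y^2 + RI_Z^2) / 2) is the joint repeatability index and
    S * sqrt(2 / n) is the standard error of Ybar - Zbar (S / sqrt(5)
    when n = 10).
    """
    n = len(candidate)
    if len(primary) != n:
        raise ValueError("expects the same number of candidate and primary burns")
    s = math.sqrt((statistics.variance(candidate) + statistics.variance(primary)) / 2)
    diff = statistics.mean(candidate) - statistics.mean(primary)
    half = t_crit * s * math.sqrt(2 / n)
    return diff - half, diff + half
```

The candidate is accepted when the returned interval contains zero.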

NOTE: It is possible that statistical significance and chemical significance are not identical, and this procedure may prove too stringent (the criteria may be impossible to meet). That is, in chemical terms perhaps a 30 ppm standard could actually have a true concentration anywhere between 29 and 31 ppm, say, without causing any difficulties. Thus a candidate standard should be acceptable in this case if its true concentration is as low as 29 or as high as 31 ppm. In the procedure just described, then, the candidate standard should be initially accepted if 29 or 31 or any value in between is included in the confidence interval for its true concentration level. (In more general terms, accept the 30 ppm candidate if $30 + \Delta$ or $30 - \Delta$ or any number in between is covered by the confidence interval, where $\Delta$ defines the limits of chemical significance.) If the 30 ppm candidate is initially rejected, and 10 more burns are alternated with the 30 ppm reference standard, accept the 30 ppm candidate if the confidence interval for the mean difference in the two concentrations includes $-2$ or $2$ or any number in between. (Again, if $30 - \Delta$ and $30 + \Delta$ define the limits of chemical significance, accept the 30 ppm candidate if $-2\Delta$ or $2\Delta$ or any number in between is covered by the confidence interval for the difference.) With these modifications for chemical significance, the procedure described should prove a practical and useful way to control the quality of newly prepared standards.
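Both acceptance rules in the NOTE amount to asking whether the confidence interval overlaps a tolerance band. A small sketch with a hypothetical function name: `target` is the nominal concentration with `tol` equal to $\Delta$ for the first check, or 0 with `tol` equal to $2\Delta$ for the check on the difference.

```python
def accept_with_tolerance(interval, target, tol):
    """Accept if some value in [target - tol, target + tol] lies in the interval.

    interval is (lower, upper) confidence limits; the two ranges overlap
    exactly when lower <= target + tol and upper >= target - tol.
    """
    lo, hi = interval
    return lo <= target + tol and hi >= target - tol
```

For example, a difference interval of (-3, -1) with $\Delta = 1$ overlaps [-2, 2] and so leads to acceptance, whereas a 30 ppm interval of (31.5, 32.0) with $\Delta = 1$ does not.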

Origin of Table 4.

The numbers in Table 4 were computed from data sets supplied by the TSC as an enclosure to their letter dated July 28, 1977. Data sets 1 through 9 contain 3 collections of 10 burns of primary reference standard R-1 by the Pensacola laboratory. RI was computed for each of these, for each element, giving 3 RI values for each element-concentration combination. These 3 RI's were pooled within each concentration-element combination, using the formula

$$RI_p = \sqrt{\tfrac{1}{3}\left(RI_1^2 + RI_2^2 + RI_3^2\right)}.$$

In theory $RI_p^2$ is a constant times a $\chi^2$ random variable with 27 degrees of freedom. If we let $RI^*$ be the repeatability index from 10 burns of a candidate standard (for some specified element and concentration), the ratio $(RI^*)^2/RI_p^2$ has the F-distribution with 9 and 27 degrees of freedom and, with probability .99, this ratio should not exceed 3.16 or, equivalently, $RI^*$ should not exceed $RI_p\sqrt{3.16}$. This latter value is given in Table 4. Three entries in Table 4, Si-30, Sn-30 and Mo-3, did not seem reasonable when calculated from this formula, due to what appeared to be aberrant results in data sets 1 through 9. These have been adjusted slightly from what this formula would give. As indicated earlier, the numbers in Table 4 may be too low (that is, too stringent) in some cases. In such situations larger limiting values for RI have to be chosen.
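The construction of the Table 4 limits can be sketched as follows: pool the three RI values by root-mean-square, then multiply by $\sqrt{3.16}$. The F value is taken from the text rather than computed, the function names are hypothetical, and the three manually adjusted entries (Si-30, Sn-30, Mo-3) would not be reproduced by this formula.

```python
import math


def pooled_ri(ri_values):
    """Pool RI values from runs of equal size: root-mean-square of the RIs."""
    return math.sqrt(sum(r * r for r in ri_values) / len(ri_values))


def ri_limit(ri_values, f_crit=3.16):
    """Limiting RI for a new 10-burn run: RI_p * sqrt(F).

    f_crit = 3.16 is the value the text uses for the .99 point of the
    F-distribution with 9 and 27 degrees of freedom.
    """
    return pooled_ri(ri_values) * math.sqrt(f_crit)
```

Since $\sqrt{3.16} \approx 1.78$, each Table 4 entry computed this way is roughly 1.78 times the pooled RI from the three runs.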

II.5. A Numerical Example

Assume the 10 readings obtained for a 30 ppm candidate standard are as given in Table 5. The average values, $\bar{X}$, and RI values are also listed there, as are the lower and upper 99% confidence limits computed from the formula discussed above. Note that none of the RI values exceed the appropriate entries in Table 4, so the next step is the computation of the confidence limits (given in Table 5). The confidence limits for Fe, Al, Ni, Pb and Si do include 30, the nominal level tested, so these elements appear to be at the correct concentration level. None of the confidence intervals for the remaining elements, however, contain 30, so they would all be suspect. Now let us suppose that chemical common sense dictates that the true ppm content could be anywhere between 29 and 31 ($\Delta = 1$) and the candidate standard would still be acceptable. This would mean that we want to see if 29 or 31 or any number in between is included between the confidence limits for the remaining elements. With this change, Cr, Sn, Ti and Mo are now acceptable, but Ag, Cu, Mg and Na are still unacceptable. Thus, 10 more burns of the candidate, alternating with 10 burns of the 30 ppm primary reference standard, are called for, with only the readings for Ag, Cu, Mg and Na to be analyzed.

Assume the values in Table 6 result. Again all RI values are acceptable (compared with the entries in Table 4). Also given in Table 6 are the values of

$$S = \sqrt{\tfrac{1}{2}\left(RI_Y^2 + RI_Z^2\right)}$$

and the lower and upper 99% confidence limits for the difference in mean concentration of the candidate and reference standards, computed from the formula discussed above. Since each confidence interval includes zero, we would conclude that the 30 ppm candidate is acceptable for all elements. (Granted that chemical common sense allows $\Delta = 1$, we would still have accepted the candidate if the 99% confidence limits for Na had been, say, -3 and -1, since this interval includes $-2 = -2\Delta$.)
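The computation just described can be checked with a short Python sketch that recomputes the S values and the 99% confidence limits of Table 6 from the averages and repeatability indices. It assumes, as the numbers in Table 6 bear out, that the limits have the form $\bar Y - \bar Z \pm 2.878\,S/\sqrt{5}$ (2.878 being the 99.5th percentile of Student's t with 18 degrees of freedom, and $1/\sqrt{5} = \sqrt{1/10 + 1/10}$ for two averages of 10 burns each), and it encodes the $\pm 2\Delta$ acceptance band of step f in Section II.6. The function name `step_f_limits` is our own.

```python
import math

T_995_18 = 2.878  # 99.5th percentile of Student's t with 18 degrees of freedom


def step_f_limits(y_bar, ri_y, z_bar, ri_z, delta, t=T_995_18):
    """99% limits for (candidate mean - reference mean) from two sets of 10 burns.

    Returns (S, lower, upper, accepted); the candidate is accepted when some value
    in [-2*delta, 2*delta] lies between the limits.
    """
    s = math.sqrt((ri_y ** 2 + ri_z ** 2) / 2)  # pooled standard deviation
    half_width = t * s / math.sqrt(5)           # sqrt(1/10 + 1/10) = 1/sqrt(5)
    diff = y_bar - z_bar
    lower, upper = diff - half_width, diff + half_width
    accepted = lower <= 2 * delta and upper >= -2 * delta
    return s, lower, upper, accepted


# (Y-bar, RI_Y, Z-bar, RI_Z) for the 30 ppm candidate, taken from Table 6
table6 = {
    "Ag": (33.58, 0.75, 33.59, 1.04),
    "Cu": (32.58, 0.41, 32.49, 0.88),
    "Mg": (32.80, 0.93, 32.50, 1.08),
    "Na": (33.65, 1.23, 34.29, 0.89),
}
for element, summary in table6.items():
    s, lower, upper, ok = step_f_limits(*summary, delta=1)
    print(f"{element}: S = {s:.2f}  limits = ({lower:.2f}, {upper:.2f})  accept = {ok}")
```

Because the sketch does not round S before using it, a limit can differ from Table 6 in the last digit (the Cu lower limit comes out as -0.79 rather than -.80); the accept/reject decisions are unchanged.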


II.6. Summary of Calibration Standards Testing

a. Carefully standardize the spectrometer using the primary reference standard at 0 ppm and 100 ppm.

b. Following accepted laboratory techniques, make 10 burns of the candidate standard at each prepared concentration: 3, 10, 30, 50 and 100 ppm.
c. For each element and concentration compute the average

$$\bar X = \frac{1}{10}\sum_{j=1}^{10} X_j$$

and the repeatability index

$$RI = \sqrt{\frac{1}{9}\sum_{j=1}^{10}\left(X_j - \bar X\right)^2}.$$


d. Compare RI for each element and concentration with the appropriate value in Table 4. If RI exceeds the value in Table 4 for any element-concentration combination, restandardize the spectrometer, carefully repeat 10 burns of the candidate at the same concentration and again compute RI for each element. If any RI exceeds the appropriate value in Table 4, the spectrometer should be checked before proceeding further. After the spectrometer is again in good working order, start again at a.

e. For each element-concentration combination compute the 99% confidence limits for the true concentration:

$$\bar X - (3.250)\frac{RI}{\sqrt{10}}, \qquad \bar X + (3.250)\frac{RI}{\sqrt{10}}.$$

Let $C_0$ represent the nominal concentration level and $C_0 \pm \Delta$ the limits of chemical significance. If $C_0 - \Delta$, $C_0 + \Delta$ or any value in between lies between the confidence limits $\bar X \pm (3.250)\,RI/\sqrt{10}$ for each element-concentration combination, accept the candidate standard. If this is not true for some element-concentration combinations, go to f.
f. For each concentration where $C_0 - \Delta$, $C_0 + \Delta$ and every value in between fall outside the confidence interval in e., repeat 10 burns of the candidate, alternating with burns of the primary reference standard of the same concentration. The following computations are made only for the elements from e. whose true concentration is suspect. Let $\bar Y$, $RI_Y$ be the average and repeatability index for the candidate and let $\bar Z$, $RI_Z$ be the average and repeatability index for the primary reference standard at the same concentration, same element. Compute the 99% confidence limits for the difference in true concentration level for the two:

$$\bar Y - \bar Z - (2.878)\frac{S}{\sqrt{5}}, \qquad \bar Y - \bar Z + (2.878)\frac{S}{\sqrt{5}},$$

where

$$S = \sqrt{\tfrac{1}{2}\left(RI_Y^2 + RI_Z^2\right)}.$$

If $-2\Delta$, $2\Delta$ or any value in between lies between these confidence limits, that element appears to have an acceptable concentration level. If all element-concentration levels that were suspect from e. satisfy this, conclude that the candidate standard is acceptable at all concentrations tested. Any element-concentration combination for which this is not satisfied appears to have an unacceptable concentration level and should be rejected.
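Steps c and e can be sketched in Python as follows. The sketch handles a single element-concentration cell; it takes 3.250 to be the 99.5th percentile of Student's t with 9 degrees of freedom and reads "lies between the confidence limits" as a closed-interval overlap, which is our assumption. The function names are hypothetical, and the ten Ag candidate readings of Table 6 are reused only to have concrete numbers (at 30 ppm with $\Delta = 1$ they fail step e, so step f would follow).

```python
import math

T_995_9 = 3.250  # 99.5th percentile of Student's t with 9 degrees of freedom


def average_and_ri(readings):
    """Step c: average and repeatability index (sample standard deviation, divisor n - 1)."""
    n = len(readings)
    mean = sum(readings) / n
    ri = math.sqrt(sum((x - mean) ** 2 for x in readings) / (n - 1))
    return mean, ri


def step_e(readings, c0, delta, t=T_995_9):
    """Step e: 99% limits X-bar -/+ t * RI / sqrt(n) for one element-concentration cell.

    Accepted when some value in [c0 - delta, c0 + delta] lies between the limits.
    """
    mean, ri = average_and_ri(readings)
    half_width = t * ri / math.sqrt(len(readings))
    lower, upper = mean - half_width, mean + half_width
    accepted = lower <= c0 + delta and upper >= c0 - delta
    return lower, upper, accepted


# Illustration only: the ten Ag candidate readings of Table 6, nominal 30 ppm, delta = 1
ag = [33.3, 32.3, 34.3, 33.3, 34.8, 33.6, 33.3, 32.9, 33.6, 34.4]
mean, ri = average_and_ri(ag)
lower, upper, accepted = step_e(ag, c0=30, delta=1)
print(f"X-bar = {mean:.2f}, RI = {ri:.2f}")                          # X-bar = 33.58, RI = 0.75
print(f"limits = ({lower:.2f}, {upper:.2f}), accepted = {accepted}")  # limits = (32.81, 34.35), accepted = False
```

Comparing RI with Table 4 (step d) is left out because those tabulated limits are not reproduced here.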
TABLE 6. Candidate and Reference Readings

          Candidate (Y, RI_Y)            Reference (Z, RI_Z)
Burn       Ag     Cu     Mg     Na         Ag     Cu     Mg     Na
  1      33.3   32.4   32.7   33.9       33.5   33.3   33.5   33.9
  2      32.3   32.1   31.4   30.7       34.4   33.3   33.4   34.7
  3      34.3   33.3   34.4   34.7       34.6   33.2   33.6   34.1
  4      33.3   32.7   32.9   33.3       34.1   33.2   32.4   35.4
  5      34.8   33.2   33.6   34.5       34.9   33.1   32.6   34.8
  6      33.6   32.7   32.1   34.3       32.9   31.6   32.9   32.6
  7      33.3   32.4   32.9   32.7       31.5   31.1   30.0   33.4
  8      32.9   32.1   31.9   34.7       32.5   31.3   31.5   33.8
  9      33.6   32.4   33.8   33.4       33.9   32.2   32.3   35.0
 10      34.4   32.5   32.3   34.3       33.6   32.6   32.8   35.2
Avg     33.58  32.58  32.80  33.65      33.59  32.49  32.50  34.29
RI        .75    .41    .93   1.23       1.04    .88   1.08    .89

                Ag      Cu      Mg      Na
S              .91     .69    1.01    1.07
Lower CL     -1.18    -.80   -1.00   -2.02
Upper CL      1.16     .98    1.60     .74
III. LABORATORY CERTIFICATION

III.1. Introduction

Paragraph 2 of project order MME-77-006 requires the development of statistical methodology to evaluate and certify the spectrometric laboratories participating in the Joint Oil Analysis Program. The evaluation of a laboratory is to comprise three sub-evaluations, viz., an evaluation of the spectrometer performance, a comparison of the laboratory's performance with that of another laboratory that is considered to have met certification criteria, and an assessment of the oil analysis evaluator's ability to make correct decisions based on the results of the analyses.

The methods we present in this report are applicable for evaluating the spectrometric analysis results on a single element. As in the previous section, separate evaluations for the different elements are recommended and, of course, the same statistical methods are to be used with each element. The same is also true for different initial concentration levels in the standard oil samples; a separate statistical analysis is to be performed for each initial concentration level. The rest of the discussion, therefore, will apply to the results of repeated independent analyses (replications) on a single element with a fixed initial concentration level in the standard oil samples. However, a laboratory should be considered to have met all certification requirements only if it passes the statistical tests for each combination of element and concentration level.

The spectrometer evaluation methodology will require each laboratory to analyze a standard sample with a fixed initial concentration level each day. If the spectrometer performance is to be examined at different concentration levels, then daily analyses must be performed at each concentration level of interest. At the time a laboratory is due for certification, the data for the immediately preceding twelve months will be used.* The interlaboratory comparison does not require any new data; all the required information can be extracted from the monthly correlation reports.

III.2. Spectrometer Certification

We propose a two-part procedure for determining whether a spectrometer meets certification criteria. The first part is a macro test to see if, during the preceding year, the accuracy and repeatability indices were on the average within "acceptable limits." The acceptable limits we propose for use are the maximum allowable accuracy and repeatability indices as given on page 8-2 of the JOAP Laboratory Manual of 1 May 1977. We recognize that these limits are quite conservative in the sense that they are not the tightest bounds possible. If a better set of bounds can be determined, perhaps based on past data, they should be used in the tests described herein. Part two is a micro test comprised of twelve separate analyses of the monthly results; this test is essentially a test for consistency.

* If the laboratory is new and has been in existence for less than one year, a modified procedure, described at the end of this section, may be used.

Let $X_{ij}$, $i = 1, 2, \ldots, 12$; $j = 1, 2, \ldots$, be the results of the spectrometric analyses for a specified combination of element and concentration level. The subscript $i$ ranges over the twelve months and the subscript $j$ represents the working days within each month. Thus the total number of X's will be equal to the number of working days for the year. Let

$n_i$ = number of data points for the $i$-th month

$N = \sum_{i=1}^{12} n_i$ = total number of observations

$\bar X = \sum_{i=1}^{12}\sum_{j=1}^{n_i} X_{ij}/N$ = average for the year

$S^2 = \sum_{i=1}^{12}\sum_{j=1}^{n_i}\left(X_{ij} - \bar X\right)^2/(N-1)$ = sample variance for the year

$\mu_0$ = initial concentration level

$A_0$ = maximum allowable accuracy level

$R_0 = A_0/2$ = maximum allowable repeatability level

$\alpha = .05$ = significance level or Type I error probability

$z_{.05} = -1.645$ = tabulated 5th percentile of the standard normal distribution

$z_{.975} = 1.96$ = tabulated 97.5th percentile of the standard normal distribution

$\chi^2_{.05,N-1} = \tfrac{1}{2}\left[-1.645 + \sqrt{2N-3}\right]^2$ = approximate 5th percentile of a chi-square distribution with $N-1$ degrees of freedom

$t_{.975,9} = 2.262$ = 97.5th percentile of Student's t-distribution with 9 degrees of freedom
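The approximate chi-square percentile in the list above is the familiar normal approximation $\chi^2_{p,\nu} \approx \tfrac{1}{2}\left(z_p + \sqrt{2\nu - 1}\right)^2$ with $\nu = N - 1$, and it is easy to evaluate directly. A minimal Python sketch (the function name is ours) that reproduces the value 215.96 used in the Cu example of part c:

```python
import math

Z_05 = -1.645  # 5th percentile of the standard normal distribution


def chi2_05_approx(n_obs):
    """Approximate 5th percentile of chi-square with n_obs - 1 degrees of freedom,
    using (1/2) * (z_.05 + sqrt(2N - 3))**2."""
    return 0.5 * (Z_05 + math.sqrt(2 * n_obs - 3)) ** 2


print(f"{chi2_05_approx(253):.2f}")  # 215.96
```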


We assume that the $X_{ij}$'s are normally distributed with an unknown mean value $\mu$ and an unknown variance $\sigma^2$. Previous studies have shown that, as a general rule, the results of spectrometric analyses tend to be normally distributed.

a. Macro Test. This test consists of statistically establishing whether or not the true accuracy index $|\mu - \mu_0|$ and the true repeatability index $\sigma$ are below the maximum values $A_0$ and $R_0$, respectively. We first compute a 95% upper confidence bound for $\sigma^2$ as $(N-1)S^2/\chi^2_{.05,N-1}$, that is,

$$P\left[\sigma^2 < \frac{(N-1)S^2}{\chi^2_{.05,N-1}}\right] = .95.$$

Since it is required that $\sigma^2 < R_0^2$, we can conclude, with about 95% confidence, that the repeatability index is within acceptable bounds provided that

$$\frac{(N-1)S^2}{\chi^2_{.05,N-1}} < R_0^2.$$

The chance that this procedure will result in a conclusion that the repeatability index is unacceptable, when in fact it is acceptable, is about 5%. Next, we obtain a 95% confidence interval for $\mu$ as $\bar X \pm z_{.975}\,S/\sqrt{N}$, that is,

$$P\left[\bar X - z_{.975}\frac{S}{\sqrt{N}} < \mu < \bar X + z_{.975}\frac{S}{\sqrt{N}}\right] = .95. \qquad (1)$$

The maximum acceptable accuracy index is $A_0$, which implies that $|\mu - \mu_0|$ must be less than $A_0$, or equivalently $\mu$ must satisfy the inequality constraint

$$\mu_0 - A_0 < \mu < \mu_0 + A_0. \qquad (2)$$

A combination of (1) and (2) provides the criterion for acceptability of the accuracy index, viz., conclude that the accuracy index for the spectrometer meets the certification criterion if

$$|\bar X - \mu_0| < A_0 - (1.96)\frac{S}{\sqrt{N}}.$$

The probability of wrongly concluding that the accuracy index is unacceptable is about 5%. If both the accuracy index and the repeatability index are found to be acceptable, the macro test has been met and we proceed to the next stage.
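The two macro-test criteria translate directly into code. This Python sketch (function and argument names are ours) uses the chi-square approximation and the constants defined above, and is checked against the Cu figures of the example in part c.

```python
import math


def macro_test(n_obs, x_bar, s, mu0, a0, r0):
    """Return (variance bound, accuracy limit, repeatability_ok, accuracy_ok)."""
    chi2_05 = 0.5 * (-1.645 + math.sqrt(2 * n_obs - 3)) ** 2  # approx. 5th pct, N - 1 df
    var_upper = (n_obs - 1) * s ** 2 / chi2_05                 # 95% upper bound for sigma^2
    accuracy_limit = a0 - 1.96 * s / math.sqrt(n_obs)
    return (var_upper, accuracy_limit,
            var_upper < r0 ** 2,
            abs(x_bar - mu0) < accuracy_limit)


# Cu example from part c: N = 253, X-bar = 98.5, S = 3.84, mu0 = 100, A0 = 10.5, R0 = 5.3
bound, limit, rep_ok, acc_ok = macro_test(253, 98.5, 3.84, 100, 10.5, 5.3)
print(f"{bound:.2f} < {5.3 ** 2:.2f}: {rep_ok};  1.5 < {limit:.2f}: {acc_ok}")
# 17.21 < 28.09: True;  1.5 < 10.03: True
```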

b. Micro Test. This is a procedure to check whether, on a monthly basis, the spectrometric analysis results are consistent and there are no significant fluctuations from month to month. We do this by computing twelve 95% confidence intervals for the unknown mean $\mu$, each based on a sample of 10 observations from one month. From among the $n_i$ observations for the $i$-th month a sample of size 10 is selected; we suggest that every second observation, starting with the second working day of each month, be selected. As long as the spectrometric laboratories are not aware of the selection process, it should not introduce any systematic bias. It may happen that for certain months (February, for example) the selection scheme will not result in ten samples. If this is the case, additional samples to make up the difference should be taken at random from the remaining data for the month. Let $Y_{i1}, Y_{i2}, \ldots, Y_{i,10}$ be the ten measurements sampled for the $i$-th month and let

$$\bar Y_i = \sum_{j=1}^{10} Y_{ij}/10$$

be the sample mean and

$$S_i^2 = \sum_{j=1}^{10}\left(Y_{ij} - \bar Y_i\right)^2/9$$

the sample variance.
The 95% confidence interval for $\mu$ for the $i$-th month will then be

$$\bar Y_i - (2.262)\frac{S_i}{\sqrt{10}} < \mu < \bar Y_i + (2.262)\frac{S_i}{\sqrt{10}}, \qquad i = 1, 2, \ldots, 12.$$

As in the case of the macro test, we conclude that the accuracy index for the $i$-th month meets the certification criterion if

$$|\bar Y_i - \mu_0| < A_0 - (2.262)\frac{S_i}{\sqrt{10}}.$$

This procedure will wrongly conclude that the results for a month do not meet certification criteria about 5% of the time. Now let us examine the results of the "acceptance sampling" scheme for the twelve months in question. If the spectrometer performance is consistent throughout the year, the number of monthly acceptance sampling tests that lead to a rejection has a binomial distribution with parameters $m = 12$ and $p = .05$. An examination of the binomial tables shows that about 98% of the time at least 10 monthly tests should result in acceptance. Thus the micro test will conclude that the spectrometer does not meet the certification criterion if the number of monthly tests that lead to acceptance is less than 10.
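The "about 98%" figure can be confirmed from the binomial probabilities instead of tables. A short Python sketch (the helper name is ours; `math.comb` requires Python 3.8 or later):

```python
from math import comb


def prob_at_most(k, m, p):
    """P[R <= k] for R ~ Binomial(m, p)."""
    return sum(comb(m, j) * p ** j * (1 - p) ** (m - j) for j in range(k + 1))


# At least 10 of 12 monthly tests accept  <=>  at most 2 rejections, each with p = .05
p_pass = prob_at_most(2, 12, 0.05)
print(f"{p_pass:.4f}")  # 0.9804
```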


c. Examples. Annual laboratory certification is a new concept and will not be operational for a while. We will, therefore, use sample statistics derived from the validation data on standard samples (furnished by JOAP-TSC) to illustrate the methods described in this report.

Macro Test:

Element: Cu
Initial concentration $\mu_0$ = 100 ppm
Max accuracy limit $A_0$: 10.5
Max repeatability limit $R_0$: 5.3
$N$ = 253 = approximate number of working days in a year
$\bar X$ = 98.5 = average of 253 spectrometer readings
$S$ = 3.84 = sample standard deviation of 253 observations

$$\chi^2_{.05,252} = \tfrac{1}{2}\left[-1.645 + \sqrt{503}\right]^2 = 215.96$$

$$\frac{(N-1)S^2}{\chi^2_{.05,N-1}} = \frac{(252)(3.84)^2}{215.96} = 17.21$$

Since 17.21 is less than $R_0^2 = 28.09$, we conclude that the repeatability index meets the macro certification criterion.

$$|\bar X - \mu_0| = |98.5 - 100| = 1.5$$

$$A_0 - (1.96)\frac{S}{\sqrt{N}} = 10.5 - \frac{(1.96)(3.84)}{\sqrt{253}} = 10.03$$

Since $|\bar X - \mu_0| < A_0 - (1.96)S/\sqrt{N}$, the accuracy index also meets the macro certification criterion.


Micro Test:

The sample statistics and the results of the statistical analysis are presented in tabular form below.

Element: Cu;  μ0 = 100;  A0 = 10.5

Month     Ȳ_i     S_i  |Ȳ_i - μ0|   A0 - (2.262)S_i/√10   Accept or Reject
  1      98.1    1.37         1.9                  9.52   Accept
  2      96.2    2.57         3.8                  8.66   Accept
  3     100.7    2.31         0.7                  8.85   Accept
  4     101.4    2.37         1.4                  8.80   Accept
  5     101.8    3.19         1.8                  8.22   Accept
  6      99.7    3.16         0.3                  8.24   Accept
  7     100.7    2.95         0.7                  8.40   Accept
  8      99.2    4.26         0.8                  7.45   Accept
  9      97.0    2.00         3.0                  9.07   Accept
 10     100.7    2.41         0.7                  8.78   Accept
 11      98.3    1.57         1.7                  9.38   Accept
 12      97.6    2.46         2.4                  8.74   Accept
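The table can be reproduced, and the effect of the tighter limit $A_0' = A_0/2$ discussed next can be previewed, with a small Python sketch. The function name is ours; the monthly means and standard deviations are those in the table.

```python
import math


def month_accepts(y_bar, s_i, mu0, a0, n=10, t=2.262):
    """Monthly criterion |Y_i - mu0| < A0 - t * S_i / sqrt(n); returns (accepted, limit)."""
    limit = a0 - t * s_i / math.sqrt(n)
    return abs(y_bar - mu0) < limit, limit


# (Y_i, S_i) for months 1-12, element Cu, mu0 = 100
months = [(98.1, 1.37), (96.2, 2.57), (100.7, 2.31), (101.4, 2.37),
          (101.8, 3.19), (99.7, 3.16), (100.7, 2.95), (99.2, 4.26),
          (97.0, 2.00), (100.7, 2.41), (98.3, 1.57), (97.6, 2.46)]

for a0 in (10.5, 10.5 / 2):
    decisions = [month_accepts(y, s, 100, a0)[0] for y, s in months]
    accepted = sum(decisions)
    print(f"A0 = {a0}: {accepted} of 12 months accepted; micro test passed: {accepted >= 10}")
```

With $A_0 = 10.5$ all twelve months are accepted; with 5.25 only the second month is rejected, so the micro test is still passed.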

Since each of the twelve monthly results is within acceptable limits, the conclusion is that the spectrometer performance is consistent. It is apparent that with $A_0 = 10.5$ a monthly result will not be rejected unless the monthly average $\bar Y_i$ differs from $\mu_0$ by a large amount; an examination of the validation data for standard samples shows that large differences occur very rarely, if at all. A more sensitive procedure would result if the maximum accuracy deviation were modified to $A_0' = A_0/2 = 10.5/2 = 5.25$. If this change is adopted, the results of the macro test will be unaffected, since $A_0' - (1.96)S/\sqrt{N} = 4.78$ and $|\bar X - \mu_0|$ is less than 4.78. For the micro test, the results for the second month will be unacceptable, since $|\bar Y_2 - \mu_0| = 3.8$ is greater than $A_0' - (2.262)S_2/\sqrt{10} = 3.41$. However, because only one of the twelve monthly tests leads to rejection, the micro test would still result in the conclusion that the spectrometer is consistent. Even if $A_0$ is changed to $A_0'$, the maximum repeatability index $R_0 = A_0/2$ must be left unchanged, since it is already a reasonably tight bound. It should be pointed out that, in order to qualify for certification, a laboratory has to pass each of the statistical tests for all combinations of elements and concentration levels for which data have been collected. With 20 elements and 5 concentration levels the number of combinations is 100. If $A_0' = A_0/2$ is used in place of $A_0$ itself as the maximum accuracy limit, this will definitely increase the chance of at least one rejection out of the 100 combinations.

Some of the newer laboratories will have been in existence for less than a year. In these cases a full year's data will not be available and the tests will have to be modified. As an example, if data are available for six months or more, both the macro and micro tests can still be performed. For the macro test, the parameters $N$ and $\chi^2_{.05,N-1}$ quoted earlier should be modified accordingly. For the micro test, the number of months $m = 12$ will have to be replaced with the actual number of months for which data are available, and a new "acceptance number" has to be determined from an examination of the tables of the binomial distribution. We recommend that the micro test not be used if the number of months is less than 6, since we believe that the test will not be very sensitive in this case.
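For a laboratory with fewer than twelve months of data, the new acceptance number can be computed instead of read from binomial tables. The Python sketch below assumes, as our own choice, that the same roughly 98% level used for twelve months is kept and that each month is rejected independently with probability .05; the function name is hypothetical.

```python
from math import comb


def acceptance_number(m, p=0.05, target=0.98):
    """Largest c with P[at least c of m months accepted] >= target,
    when each month is independently rejected with probability p."""
    cumulative = 0.0
    for rejections in range(m + 1):
        cumulative += comb(m, rejections) * p ** rejections * (1 - p) ** (m - rejections)
        if cumulative >= target:
            return m - rejections
    return 0


for m in range(6, 13):
    print(m, "months: accept the spectrometer if at least", acceptance_number(m), "months pass")
```

For 6 to 12 months this gives $m - 2$ in every case, i.e. at most two monthly rejections are tolerated; for twelve months it reproduces the acceptance number 10.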


III.3. Interlaboratory Comparison

As indicated in the introduction, the laboratory certification scheme is to include a comparison of the performance of a laboratory that is to be certified with that of another laboratory that has previously received certification. We believe that it is preferable to use a single laboratory, such as the Pensacola laboratory, as a standard against which all others are compared. The advantage of doing so is that the performance of the standard laboratory can be monitored on a regular basis to maintain a high performance level; besides, comparing all laboratories against a single standard laboratory is a more equitable procedure. The comparison procedure will use data already available in the monthly correlation reports.

At the time of certification, the results of the spectrometric analyses of the standard samples for the preceding twelve months are extracted both for the laboratory in question and for the Pensacola laboratory (the laboratories also analyze used oil samples under the correlation program, but these are not of interest here). Let $X_1, X_2, \ldots, X_{12}$ be the spectrometer readings for the Pensacola laboratory and $Y_1, Y_2, \ldots, Y_{12}$ the corresponding readings for the laboratory to be certified.

We will assume that

(i) $X_1, X_2, \ldots, X_{12}$ are independent and are normally distributed with means $\mu_1, \mu_2, \ldots, \mu_{12}$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_{12}^2$;

(ii) $Y_1, Y_2, \ldots, Y_{12}$ are independent and have normal distributions with means $\nu_1, \nu_2, \ldots, \nu_{12}$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_{12}^2$;

(iii) from past records for the Pensacola laboratory (not including the twelve months' data used for the comparison), estimates $S_1^2, S_2^2, \ldots, S_{12}^2$ for the variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_{12}^2$ can be computed from samples of size $n = 10$ each.
The reason for letting the $\mu$'s, $\nu$'s and $\sigma^2$'s differ from month to month is to allow for the possibility that the standard samples have different initial concentration levels and consequently non-identical means and variances. Note that the number of distinct $\mu$'s, $\nu$'s and $\sigma^2$'s is equal to the number of distinct concentration levels in the correlation samples.

The implication of the assumption that both the X's and the Y's have the same variance within each month is that the emphasis in the interlaboratory comparison is on accuracy and not so much on repeatability, provided, of course, that the repeatability indices are not too far apart.

With the above assumptions, the quantities

$$t_i = \frac{(X_i - Y_i) - (\mu_i - \nu_i)}{\sqrt{2}\,S_i}, \qquad i = 1, 2, \ldots, 12,$$

are independent and each $t_i$ has a Student's t-distribution with $n - 1 = 9$ degrees of freedom. If the performance of the laboratory to be certified is the same as that of the Pensacola laboratory, $\mu_i$ will be equal to $\nu_i$. In this case the event $|X_i - Y_i| > 2S_i$ is the same as $|t_i| > \sqrt{2}$, and it can be shown that $P[|X_i - Y_i| > 2S_i] = .20$ approximately. In other words, if the means for the two laboratories are equal, the observed readings $X_i$, $Y_i$ will differ by at least two standard deviations about 20% of the time. Now consider the twelve absolute differences $|X_i - Y_i|$, $i = 1, 2, \ldots, 12$. The number of times these differences exceed twice the corresponding standard deviation $S_i$ is a binomial random variable with parameters $m = 12$ and $p = .20$. From the binomial tables, the number of differences that exceed twice the standard deviation will be less than or equal to five with probability .98; equivalently, the chance of observing six or more pairs that differ by more than two standard deviations is .02. This then provides a comparison test, summarized below:

Step 1: From past records for the Pensacola laboratory compute the sample variances $S_1^2, S_2^2, \ldots, S_{12}^2$, using a sample of size 10 for each computation. The number of different $S_i^2$ to be computed is equal to the number of distinct concentration levels used in the correlation samples. If all correlation samples have the same concentration level, only one $S^2$ needs to be computed. From a practical point of view, the trimmed sample variances already available in the correlation reports may serve the purpose and may save some labor. We believe that this change will not severely affect the validity of the statistical procedure.

Step 2: Compute $|X_i - Y_i|$, $i = 1, 2, \ldots, 12$.
Step 3: Let $K$ = number of differences $|X_i - Y_i|$ that exceed $2S_i$.

Step 4: If $K \le 5$, conclude that the laboratory under examination meets certification criteria.
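Both probabilities behind the cutoff $K \le 5$ can be checked numerically. The Python sketch below uses only the standard library: it integrates the Student's t density with Simpson's rule (our choice of method, not the report's), using the equivalence of $|X_i - Y_i| > 2S_i$ and $|t_i| > \sqrt{2}$ when $\mu_i = \nu_i$, and then sums the binomial probabilities for $m = 12$, $p = .20$. The function names are ours.

```python
import math
from math import comb


def t_density(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def prob_abs_t_exceeds(x, df, steps=2000):
    """P[|T| > x], integrating the density over [0, x] with Simpson's rule (steps even)."""
    h = x / steps
    total = t_density(0.0, df) + t_density(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    return 1 - 2 * (h / 3) * total


# |X_i - Y_i| > 2 S_i  is equivalent to  |t_i| > 2 / sqrt(2) = sqrt(2)  when mu_i = nu_i
p_exceed = prob_abs_t_exceeds(math.sqrt(2), df=9)
# K ~ Binomial(12, .20): probability of at most five exceedances
p_k_at_most_5 = sum(comb(12, j) * 0.2 ** j * 0.8 ** (12 - j) for j in range(6))
print(f"P[|X_i - Y_i| > 2 S_i] = {p_exceed:.3f}")
print(f"P[K <= 5] = {p_k_at_most_5:.3f}")
```

The first probability comes out close to .19, which the report rounds to .20; the second is about .98, in agreement with the binomial tables.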

Example: The data used in this example are fictitious, although some of the numbers are sample statistics computed from the validation data for standard samples. Let $X_i$, $Y_i$, $S_i$ and the initial concentration levels $\mu_{0i}$ (the concentration level in the standard sample for the $i$-th month) be as in the table below.


Month   μ0i      X_i      Y_i    S_i  |X_i - Y_i|    2S_i   Accept or Reject
  1       3     2.88     2.86    .24         0.02    0.48   Accept
  2       3     2.98     2.80                0.18    0.48   Accept
  3      10    10.02     9.70    .44         0.32    0.88   Accept
  4      10     9.71     9.32                0.39    0.88   Accept
  5      30    29.84    29.01   1.45         0.83    2.90   Accept
  6      30    29.73    28.48                0.25    2.90   Accept
  7      50    50.45    50.14   1.93         0.31    3.86   Accept
  8      50    50.79    49.89                0.90    3.86   Accept
  9     100    102.0    102.1   4.52         0.10    9.04   Accept
 10     100    101.3    105.4                4.10    9.04   Accept
 11     100    102.1    100.1                2.00    9.04   Accept
 12     100    102.0     98.2                3.80    9.04   Accept

There are just five distinct concentration levels and hence only five different $S_i$.

There are zero rejections, so we conclude that the laboratory passes the comparison test.
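Steps 2 to 4 can be written as a short Python sketch and run on the example data. The function name is ours, and, following the remark that there are only five different $S_i$, each month's $S_i$ is looked up by its concentration level.

```python
def comparison_test(x, y, s, max_exceedances=5):
    """Steps 2-4: K = number of months with |X_i - Y_i| > 2 S_i; pass if K <= 5."""
    k = sum(abs(xi - yi) > 2 * si for xi, yi, si in zip(x, y, s))
    return k, k <= max_exceedances


# Example data from the table above (X = Pensacola, Y = laboratory to be certified)
s_by_level = {3: 0.24, 10: 0.44, 30: 1.45, 50: 1.93, 100: 4.52}
levels = [3, 3, 10, 10, 30, 30, 50, 50, 100, 100, 100, 100]
x = [2.88, 2.98, 10.02, 9.71, 29.84, 29.73, 50.45, 50.79, 102.0, 101.3, 102.1, 102.0]
y = [2.86, 2.80, 9.70, 9.32, 29.01, 28.48, 50.14, 49.89, 102.1, 105.4, 100.1, 98.2]
s = [s_by_level[level] for level in levels]

k, passes = comparison_test(x, y, s)
print(f"K = {k}, passes = {passes}")  # K = 0, passes = True
```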

The comparison test described above is applicable to most of the spectrometric laboratories participating in the Joint Oil Analysis Program. The requirement is that a laboratory must have participated in the correlation program, analyzing standard samples, for at least twelve months prior to the time the laboratory is due for certification. As indicated earlier, the advantage is that no new data need be collected and the monthly correlation reports provide all the necessary information. Some of the newer laboratories, such as the Fort Riley laboratory, will not meet this requirement. We recommend that, in these cases, the following modified approach be adopted. JOAP-TSC will prepare twelve pairs of standard samples with a mixture of concentration levels; we suggest that the twelve pairs comprise two pairs each at 3, 10, 30 and 50 ppm and four pairs at the 100 ppm concentration level. For each pair, one sample will be analyzed at Pensacola and the other by the laboratory to be certified. The statistical analysis will be along the same lines as before, i.e., as given in Steps 1 to 4 above.
III.4. Evaluation Testing

The final subtask is the design of a test to be administered to the evaluators assigned to the spectrometric laboratories. The JOAP Laboratory Manual dated 1 May 1977 provides decision-making guidance tables to aid the evaluator in his decision-making process. Separate tables are provided for each type of equipment and contain numerical criteria relating the oil sample wearmetal concentration to the expected health of a component of the equipment. The recommended decisions are based on comparisons of the results of a used oil sample with those of a previous sample. The types of decisions an evaluator can make are (i) not to take any action; (ii) call for a more frequent sampling schedule; (iii) call for an immediate additional sample; (iv) recommend a maintenance action. The losses resulting from incorrect decisions by the evaluator can be quite high. A JOAP failure, i.e., a failure of equipment monitored by JOAP before JOAP detects the problem, can result in the loss of the equipment. Similarly, a JOAP miss, i.e., a JOAP-recommended maintenance action that finds no discrepancies, can be expensive. It is therefore very important that an evaluator be quite conversant with the basic facts about wearmetal concentrations and also have sufficient experience in analyzing sample results to look for trends and short-run features, such as a sudden rise in concentration levels right after overhaul. We suggest that the examination be in two parts. The first part consists mostly of multiple-choice questions that test the basic knowledge about wearmetal concentrations that is critical for the various types of equipment being monitored. The second part will present actual historical data to illustrate the kinds of trends and ambiguities that an evaluator will encounter. The test will examine his performance as gauged by the number of correct decisions made.

A set of sample questions testing basic knowledge is presented below.

(1) Spectrometric analysis will not detect

a) worn, misaligned or scored gears

b) broken piston rings and bands

c) failures due to fluid starvation

d) loose or defective valve guides

e) chips or wearmetal particles visible to the eye

(2) Explain in two or three sentences the effect of each of the following on the integrity of spectrometric analyses:

a) contamination

b) electrodes

c) calibration standards

d) electrolytic corrosion

e) new or recently overhauled components
(3) Briefly describe the six steps to be followed in evaluating the sample results of incoming oil samples.

(4) If, for aircraft types T-1A, T-33A, T-33B or QT-33A, a sudden increase in Fe and Mg is observed, the recommended action is to inspect

a) accessory drive assembly oil pump

b) main starter housing assembly

c) main bearing seals

(5) For F-101/F-102 aircraft the most significant and critical wearmetal is

a) Fe

b) Mg

c) Cu

d) Ag

e) Cr

(6) For F-84, B-57 aircraft the most significant wearmetal is

a) Fe

b) Mg

c) Cu

d) Ag

e) Cr
The above questions are based on information contained in the JOAP Laboratory Manual. For questions (4), (5) and (6), appropriate cutaways of the equipment may be provided. The equipment types on which the questions for Navy evaluators are based should be Navy aircraft and helicopters; similarly for the other services.

We recommend that the second part of the examination present case histories illustrating the following situations:

a) slow and steady increase in wearmetal concentration, but there is no potential failure

b) slow and steady increase in concentration level, but the level has passed a critical stage

c) sample results after a recent overhaul showing a sudden increase in a wearmetal concentration

d) a JOAP failure

e) a JOAP hit

f) one or more ambiguous or marginal situations where either a maintenance action or no action would be considered reasonable

g) a case where there is a buildup in Fe concentration due to corrosion.
IV. GRAPHITE ELECTRODES

IV. 1.    Introduction 

The accuracy of readings produced by a batch of electrodes is of primary importance in judging the acceptability of the batch for use in the oil analysis program. The repeatability characteristics of the electrodes are also of some importance in judging acceptability. If a batch of electrodes scores badly on repeatability, one can expect a number of spurious readings, including ones which may be too low (possibly missing a significant increase in some contaminant in a used oil sample) and ones which may be too high (possibly indicating a high contaminant reading when the level has not changed). Thus it is suggested that both repeatability and accuracy be considered in judging the acceptability of a new batch of electrodes.

Whether a new batch of electrodes is acceptable with respect to accuracy and repeatability can best be judged by comparison with readings obtained, on the same prepared oil sample, using electrodes from a previously accepted batch. It is suggested that the elements of interest be considered one after another. For convenience it is assumed that a 10 ppm primary reference standard is used; a different oil standard could be used if desired.
IV. 2. Acceptance Criteria for Graphite Electrodes

The suggested procedure calls for analyzing the spectrometer readouts one element at a time, to ensure that the electrodes are uncontaminated by any element of interest. To distinguish between the readings obtained with the new batch of electrodes and those from the previously accepted batch, we shall use a double subscript. The first subscript equals one if the reading is made with an electrode from the new batch and two if the reading is made with an electrode from a previously accepted batch; the second subscript distinguishes between the several readings made with the same type of electrode. We shall assume $n_1$ samples are analyzed with the new electrodes and $n_2$ with the old. (There is no special reason that we would have $n_1 \ne n_2$; the formulas presented allow for either $n_1 = n_2$ or $n_1 \ne n_2$.)

Thus the element readings from the new batch are $X_{11}, X_{12}, \ldots, X_{1n_1}$ and from the previously accepted batch they are $X_{21}, X_{22}, \ldots, X_{2n_2}$. For each set of readings we can compute the sample means:

new batch: $$\bar{X}_1 = \frac{1}{n_1}\left(X_{11} + X_{12} + \cdots + X_{1n_1}\right)$$

previously accepted: $$\bar{X}_2 = \frac{1}{n_2}\left(X_{21} + X_{22} + \cdots + X_{2n_2}\right)$$

and the repeatability indices:

new batch: $$s_1 = \sqrt{\frac{(X_{11}-\bar{X}_1)^2 + (X_{12}-\bar{X}_1)^2 + \cdots + (X_{1n_1}-\bar{X}_1)^2}{n_1 - 1}}$$

previously accepted: $$s_2 = \sqrt{\frac{(X_{21}-\bar{X}_2)^2 + (X_{22}-\bar{X}_2)^2 + \cdots + (X_{2n_2}-\bar{X}_2)^2}{n_2 - 1}}$$
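As a minimal sketch using only the Python standard library (the function name `summarize_batch` is hypothetical), the mean and repeatability index of one batch's readings can be computed as follows. `statistics.stdev` uses the $n - 1$ divisor, which matches the formulas for $s_1$ and $s_2$ above.

```python
import statistics


def summarize_batch(readings):
    """Return (mean, repeatability index) for the readings of one electrode batch.

    The repeatability index is the sample standard deviation with divisor
    n - 1, as in the formulas for s_1 and s_2.
    """
    if len(readings) < 2:
        raise ValueError("at least two readings are needed to estimate repeatability")
    return statistics.mean(readings), statistics.stdev(readings)


# Hypothetical readings, for illustration only.
xbar, s = summarize_batch([9.6, 9.8, 10.1, 9.7])
```

The same function is applied once to the new-batch readings and once to the old-batch readings.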


The comparison of the two sets of readings is done in two steps. First we test the hypothesis that the repeatability index for the new batch does not exceed the index for the old. If this hypothesis is accepted, we then test the hypothesis that the mean reading for the new batch equals the mean reading for the old.

To test that the new repeatability index does not exceed the old, we compute $s_1^2/s_2^2$ and compare this ratio with a value from an $F$ table with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. Which entry to use is determined by the value desired for the probability of rejecting the new batch because of bad repeatability when in fact it has an acceptable repeatability index. Suppose we set this probability at .01 and denote the tabular entry by $F_{.99}$. We then conclude the new batch is acceptable with respect to repeatability if $s_1^2/s_2^2 \le F_{.99}$; otherwise we conclude it is not.
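The repeatability step can be sketched as below. The critical value is passed in rather than computed, on the assumption that it is read from an $F$ table as the text describes; if SciPy is available, `scipy.stats.f.ppf(0.99, n1 - 1, n2 - 1)` returns the same quantile. The function name is hypothetical.

```python
def repeatability_acceptable(s1, s2, f_crit):
    """First step: F-ratio test of the new batch's repeatability.

    s1, s2: repeatability indices of the new and previously accepted batches.
    f_crit: tabled F_.99 with n1 - 1 and n2 - 1 degrees of freedom.
    Returns the ratio s1^2 / s2^2 and whether the new batch passes.
    """
    ratio = s1 ** 2 / s2 ** 2
    return ratio, ratio <= f_crit
```

For instance, the rounded indices .255 and .333 from the worked example give a ratio of about .59, which is below 1 and hence below any $F_{.99}$ entry for 14 and 14 degrees of freedom.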
Granted that we find $s_1^2/s_2^2 \le F_{.99}$, we then proceed to test the equality of mean readings. We first compute the combined repeatability index (pooled standard deviation) by

$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}$$
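A direct transcription of the pooled formula, as a hedged sketch with a hypothetical function name:

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Combined repeatability index s_p from two batches' indices and sample sizes."""
    if n1 + n2 <= 2:
        raise ValueError("need n1 + n2 > 2 for a pooled estimate")
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
```

With $s_1 = .255$, $s_2 = .333$ and $n_1 = n_2 = 15$ it gives about .297, the value used in the worked example; when the two indices are equal, $s_p$ equals that common value.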


We then compute the test statistic

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

which is compared with an entry from the t-distribution table. Again the entry to use is determined by the desired probability of concluding the new batch is not acceptable in accuracy when, in fact, it is acceptable. Suppose we set this probability at .01; since the test is two-tailed, we need the quantile $t_{.995}$ from the t-distribution with $n_1 + n_2 - 2$ degrees of freedom. We then say the batch is acceptable with respect to accuracy if

$$\frac{|\bar{X}_1 - \bar{X}_2|}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} < t_{.995};$$

otherwise we reject the batch because of poor accuracy.
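The accuracy step can be sketched in the same way. Again the critical value $t_{.995}$ is assumed to come from a table; with SciPy, `scipy.stats.t.ppf(0.995, n1 + n2 - 2)` would supply it. The function name is hypothetical.

```python
import math


def accuracy_test(xbar1, xbar2, sp, n1, n2, t_crit):
    """Second step: two-tailed t-test on the mean readings.

    t_crit is the tabled t_.995 with n1 + n2 - 2 degrees of freedom, supplied
    by the caller. Returns the t-statistic and whether the new batch passes.
    """
    t = (xbar1 - xbar2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, abs(t) < t_crit


# Rounded summary values from the worked example below.
t_example, passes = accuracy_test(9.79, 9.35, 0.297, 15, 15, 2.763)
```

With these rounded inputs the statistic is about 4.06 and `passes` is `False`, matching the conclusion reached in the example.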
As described, this test is "two-tailed": the new batch of electrodes would be declared unacceptable if $\bar{X}_1 - \bar{X}_2$ gets too large either positively or negatively. A large positive difference may rightly be attributed to possible contamination of the new batch of electrodes. A large negative difference, however, would seem to indicate that the previously accepted batch of electrodes contains a higher concentration of the element being analyzed than does the new batch. Logically one would not want to reject the new batch in this case. If this case occurs for one or more elements, the procedure followed should be closely examined and the possibility of contamination of the old batch should be investigated.

This procedure is illustrated numerically below, assuming $n_1 = n_2 = 15$ samples analyzed with both the new and old electrodes. Although they are not written in that order, it is assumed that the analyses with the old and new electrodes are done alternately, to protect against a possible drift of the spectrometer during the period of analysis. The sample sizes of $n_1 = n_2 = 15$ are used for illustration only; in acceptance testing of large batches of material, MIL-STD-105D should be consulted regarding appropriate sample sizes. The assumed readings (for a 10 ppm standard) are
X1j (new batch)   X2j (old batch)
      9.5               9.5
     10.1               8.8
      9.8               9.1
      9.4               8.9
      9.6               9.2
      9.6               9.3
      9.5               9.5
     10.1               9.4
      9.7               9.4
      9.7              10.1
     10.0               9.3
     10.2               9.0
     10.0               9.8
     10.0               9.5
      9.7               9.4
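The summary statistics quoted in the next paragraph can be rechecked from the table with the Python standard library. This is a verification sketch only; the variable names are hypothetical.

```python
import math
import statistics

# Readings from the table above: new-batch electrodes (x1) and old-batch electrodes (x2).
x1 = [9.5, 10.1, 9.8, 9.4, 9.6, 9.6, 9.5, 10.1, 9.7, 9.7, 10.0, 10.2, 10.0, 10.0, 9.7]
x2 = [9.5, 8.8, 9.1, 8.9, 9.2, 9.3, 9.5, 9.4, 9.4, 10.1, 9.3, 9.0, 9.8, 9.5, 9.4]

n1, n2 = len(x1), len(x2)
xbar1, xbar2 = statistics.mean(x1), statistics.mean(x2)
s1, s2 = statistics.stdev(x1), statistics.stdev(x2)
f_ratio = s1 ** 2 / s2 ** 2

# Pooled index and t-statistic carried at full precision (no intermediate rounding).
sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
t_stat = (xbar1 - xbar2) / (sp * math.sqrt(1 / n1 + 1 / n2))

print(f"means {xbar1:.2f}, {xbar2:.2f}; indices {s1:.3f}, {s2:.3f}")
print(f"F ratio {f_ratio:.2f}; s_p {sp:.3f}; t {t_stat:.2f}")
```

Run as written, it reproduces the quoted means, ratio and $s_p$; $s_2$ prints as .334 (it is .3335 before rounding). Carrying full precision also gives $t \approx 4.12$ rather than the 4.06 obtained from the rounded summary values, which does not change the conclusion.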

We find $\bar{X}_1 = 9.79$, $\bar{X}_2 = 9.35$, $s_1 = .255$, $s_2 = .333$, and thus

$$\frac{s_1^2}{s_2^2} = .58.$$

Since $F_{.99}$ is about 3.7, with $n_1 - 1 = 14$ and $n_2 - 1 = 14$ degrees of freedom, we would accept the new batch for repeatability. We then compute
$$s_p = \sqrt{\frac{14(.255)^2 + 14(.333)^2}{28}} = .297$$

and

$$\frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{9.79 - 9.35}{.297\sqrt{2/15}} = 4.06.$$


We find, with $n_1 + n_2 - 2 = 28$ degrees of freedom, $t_{.995} = 2.763$, and, since

$$4.06 > 2.763,$$

we would reject the new batch in terms of accuracy. From these sample results it would appear that the new batch, on the average, gives a reading .44 ppm higher than that obtained with electrodes of the old batch, for this element. It may well be that such a difference is not practically significant, especially when one considers the acceptable equipment accuracy and repeatability indices in Tables 4-14 and 4-15, pages 4-55 and 4-56 of T.O. 33A6-7-24. These tables give the acceptable accuracy index, for 10 ppm iron concentration, as 2.21 ppm and the acceptable repeatability index (based on $n = 10$ analyses) as .94 ppm. Since the accuracy index is the absolute value of the difference between a sample average reading and the assumed true concentration in the oil, this would imply an acceptable difference in two sample averages of $2(2.21) = 4.42$ ppm. The acceptable pooled standard deviation for two samples of size 10 then would be

$$s_p = \sqrt{\frac{9(.94)^2 + 9(.94)^2}{18}} = .94$$

and the implied acceptable value of the t-statistic would be

$$\frac{4.42}{.94\sqrt{2/10}} = 10.51.$$

With 18 degrees of freedom, a random variable with the t-distribution will exceed 8.115 with probability $10^{-7}$. The implied acceptable $t$ value of 10.51 above would occur with probability considerably less than $10^{-7}$. This means that, if the two electrode batches are uncontaminated, there is less than 1 chance in 10 million of the t-statistic being this large. Therefore the tabled values mentioned do not seem to provide reasonable values for deciding the acceptability of electrode batches. Even if one allows the difference in mean readings of two samples of size 10 to be only 2.21, the table accuracy index value, this still implies an acceptable t-value of

$$\frac{2.21}{.94\sqrt{2/10}} = 5.26,$$
which has a probability of about .00005 of occurring if both batches are uncontaminated. If the new batch is contaminated and the old is not, a t-statistic of this magnitude is much more likely to be observed. Thus values of $t$ this extreme should not be called acceptable, because of the associated large probability of accepting a contaminated batch.
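The two implied t-values above follow from a one-line formula; the sketch below (hypothetical function name) reproduces them. It does not compute the tail probabilities quoted in the text; `scipy.stats.t.sf` could supply those, but that is outside this sketch.

```python
import math


def implied_t(mean_diff, s, n):
    """t-statistic implied by a difference in means of two samples of size n
    sharing repeatability index s (so the pooled index s_p equals s)."""
    return mean_diff / (s * math.sqrt(2 / n))


print(round(implied_t(2 * 2.21, 0.94, 10), 2))  # 10.51
print(round(implied_t(2.21, 0.94, 10), 2))      # 5.26
```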

It still may be desirable to allow, say, a difference of $c$ ppm in apparent content of the contaminant before rejecting the new batch. This may be accomplished as follows. If $\bar{X}_1 > \bar{X}_2$, accept the new batch unless

$$\frac{\bar{X}_1 - \bar{X}_2 - c}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} > t_{.995},$$

and if $\bar{X}_1 < \bar{X}_2$, accept the new batch unless

$$\frac{\bar{X}_1 - \bar{X}_2 + c}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} < -t_{.995}.$$

With the above values $\bar{X}_1 = 9.79$, $\bar{X}_2 = 9.35$, $s_p = .297$, $n_1 = n_2 = 15$, and with $c = 1$, we have $\bar{X}_1 > \bar{X}_2$ and thus we compute

$$\frac{9.79 - 9.35 - 1}{.297\sqrt{2/15}} = -5.16,$$

which is smaller than 2.763, so the new batch would be accepted.
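The allowance rule can be sketched as follows (hypothetical function name; $t_{.995}$ supplied from a table). The text states the second rule for $\bar{X}_1 < \bar{X}_2$; this sketch also routes the tie $\bar{X}_1 = \bar{X}_2$ through it, which always accepts.

```python
import math


def accept_with_allowance(xbar1, xbar2, sp, n1, n2, c, t_crit):
    """Accuracy test that tolerates a difference of up to c ppm before rejecting.

    Follows the two one-sided rules above; t_crit is the tabled t_.995.
    Returns the adjusted statistic and whether the new batch is accepted.
    """
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    if xbar1 > xbar2:
        t = (xbar1 - xbar2 - c) / se
        return t, t <= t_crit
    # xbar1 <= xbar2: shift the difference up by c before comparing.
    t = (xbar1 - xbar2 + c) / se
    return t, t >= -t_crit


t_adj, accepted = accept_with_allowance(9.79, 9.35, 0.297, 15, 15, c=1.0, t_crit=2.763)
```

With $c = 1$ this gives about $-5.16$ and accepts the batch, as in the text; with $c = 0$ it reduces to the two-tailed test and rejects.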

There are two possible errors which could be made in considering a new batch of electrodes: a contaminated batch may be accepted (called a Type II error) or a good batch may be rejected (called a Type I error). For any two specific sample sizes $n_1$ and $n_2$, the smaller one makes the probability of a Type I error, the larger the probability of a Type II error, and vice versa. Because of this, one may not want to use such extremely small probabilities of Type I error as would be suggested by the values in Tables 4-14 and 4-15 of T.O. 33A6-7-24-1 mentioned earlier.

Sample sizes of at least $n_1 = n_2 = 30$, with the probability of rejecting a good batch set at .01 for both the $F$ and $t$ statistics used, should provide a useful acceptance criterion. MIL-STD-105D should be consulted for reasonable sample sizes in acceptance sampling of large batches of material.

This two-stage test, or its adaptation allowing a difference of $c$ ppm, should be carried out in turn for each element of interest. If the new batch is rejected for any one or more elements, these electrodes should be declared unacceptable.
IV. 3. Summary of Acceptance Criteria

Samples of $n_1$ and $n_2$ electrodes are selected from the new and old batches, respectively. Each electrode is used only once. The instrument should be accurately calibrated using electrodes of the old batch, with an accurately prepared oil standard. The burns with new and old electrodes are done alternately: new, old, new, old, etc. For each element of interest the acceptance procedure is:

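(1) Compute the average reading for the new batch

$$\bar{X}_1 = \frac{1}{n_1}\left(X_{11} + X_{12} + \cdots + X_{1n_1}\right)$$

(2) Compute the average reading for the old batch

$$\bar{X}_2 = \frac{1}{n_2}\left(X_{21} + X_{22} + \cdots + X_{2n_2}\right)$$

(3) Compute the repeatability index for the new batch

$$s_1 = \sqrt{\frac{1}{n_1 - 1}\sum_{j=1}^{n_1}(X_{1j} - \bar{X}_1)^2}$$

(4) Compute the repeatability index for the old batch

$$s_2 = \sqrt{\frac{1}{n_2 - 1}\sum_{j=1}^{n_2}(X_{2j} - \bar{X}_2)^2}$$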
(5) Compute $s_1^2/s_2^2$ and compare with $F_{.99}$ with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. These values may be found in "Tables of Common Probability Distributions," P.W. Zehna, D.R. Barr, Naval Postgraduate School Technical Report NPS 55ZeBn 0091A, pages 16-21, or some equivalent source. Use column $n = n_1 - 1$ (interpolate if necessary), major row $m = n_2 - 1$, minor row label .99. If $s_1^2/s_2^2 \ge F_{.99}$, reject the new batch for poor repeatability. If $s_1^2/s_2^2 < F_{.99}$, go on to (6).


(6) Compute the combined repeatability index

$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}$$

(7) Compute

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$


(8) Find $t_{.995}$ from column .995, row $n = n_1 + n_2 - 2$, page 23, in "Tables of Common Probability Distributions," P.W. Zehna, D.R. Barr, Naval Postgraduate School Technical Report NPS 55ZeBn 0091A, or some equivalent source. If $n_1 + n_2 - 2 > 30$, use $t_{.995} = 2.575$.
(9) If

$$\frac{|\bar{X}_1 - \bar{X}_2|}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} < t_{.995},$$

the readings are acceptable for this element. Go on to analyze another element, starting at (1).

(10) If

$$\frac{|\bar{X}_1 - \bar{X}_2|}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \ge t_{.995},$$

the performance of the new batch of electrodes is unacceptable.
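Steps (1) through (10), together with the rule that one rejected element rejects the whole batch, can be collected into a short sketch. The function names, the verdict strings and the dictionary layout are hypothetical; the critical values are supplied by the caller from tables, as in steps (5) and (8).

```python
import math
import statistics


def evaluate_element(new, old, f_crit, t_crit):
    """Steps (1)-(10) for one element.

    new, old: readings with new-batch and old-batch electrodes.
    f_crit:   tabled F_.99 with len(new) - 1 and len(old) - 1 degrees of freedom.
    t_crit:   tabled t_.995 with len(new) + len(old) - 2 degrees of freedom.
    """
    n1, n2 = len(new), len(old)
    xbar1, xbar2 = statistics.mean(new), statistics.mean(old)      # steps (1), (2)
    s1, s2 = statistics.stdev(new), statistics.stdev(old)          # steps (3), (4)
    if s1 ** 2 / s2 ** 2 >= f_crit:                                # step (5)
        return "reject: poor repeatability"
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))  # step (6)
    t = (xbar1 - xbar2) / (sp * math.sqrt(1 / n1 + 1 / n2))        # step (7)
    if abs(t) < t_crit:                                            # steps (8), (9)
        return "accept"
    return "reject: poor accuracy"                                 # step (10)


def batch_acceptable(readings_by_element, critical_values):
    """Declare the batch unacceptable if any element is rejected.

    readings_by_element: {element: (new_readings, old_readings)}
    critical_values:     {element: (f_crit, t_crit)}
    """
    verdicts = {
        element: evaluate_element(new, old, *critical_values[element])
        for element, (new, old) in readings_by_element.items()
    }
    return all(v == "accept" for v in verdicts.values()), verdicts
```

For the readings of the worked example in Section IV.2, `evaluate_element` returns `"reject: poor accuracy"`, in agreement with the conclusion reached there.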


IV. 4. A Statistical Test to Evaluate Trace Metal Content of Graphite Electrodes as Determined on the A/E 35U-3 Spectrometer

Just as with the acceptance criteria described above, the trace metal content of the new graphite electrodes is most appropriately measured relative to readings obtained with electrodes of known quality. The procedure for accomplishing this is described below.




Assume $n_1$ burns of the selected reference standard (say, 10 ppm) have been made with electrodes from the new batch. The discussion applies to each element in turn; let $X_{11}, X_{12}, \ldots, X_{1n_1}$ be the spectrometer readings for iron, say, and let $\bar{X}_1$ and $s_1$ be the average and repeatability index, respectively, for these $n_1$ readings. Let $X_{21}, X_{22}, \ldots, X_{2n_2}$ be the spectrometer readings for this same oil standard using electrodes from a previously accepted batch. The average reading using the previously accepted electrodes then is $\bar{X}_2$ and their repeatability index (standard deviation) is $s_2$. A good measure of the excess iron trace metal content in electrodes of the new batch, relative to those previously accepted, is given by $\bar{X}_1 - \bar{X}_2$. It is easy to compute an interval with the property that we know how likely it is that the true average excess of the iron reading (new batch versus old) is included in the interval. This again requires values from the t-distribution and requires the pooled standard deviation (repeatability index):

$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}$$


If we want an interval which we are $100\gamma\%$ sure includes the true excess, we need $t^* = t_\gamma$ from the t-distribution with $\nu = n_1 + n_2 - 2$ degrees of freedom (row $\nu$, column $\gamma$). Then we can be $100\gamma\%$ sure the true average excess does not exceed

$$\bar{X}_1 - \bar{X}_2 + t^* s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

This  is  illustrated  below. 

Let us use the same data that were used in the acceptance criteria discussion above. Thus we have

$$n_1 = n_2 = 15, \quad \bar{X}_1 = 9.79, \quad \bar{X}_2 = 9.35, \quad s_p = .297,$$

and we found $t_{.995} = 2.763$ with 28 degrees of freedom. Then we can be $100\gamma\% = 99.5\%$ sure the excess iron contaminant in the new batch, relative to the old, is no larger than

$$9.79 - 9.35 + 2.763(.297)\sqrt{2/15} = .74 \text{ ppm}.$$
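The one-sided bound can be sketched as a single function (hypothetical name; the quantile $t_\gamma$ is read from a table and passed in):

```python
import math


def excess_upper_bound(xbar1, xbar2, sp, n1, n2, t_gamma):
    """One-sided upper bound on the true average excess (new minus old).

    t_gamma is the tabled t quantile with n1 + n2 - 2 degrees of freedom for
    the chosen confidence level gamma (e.g. t_.995 for 99.5%).
    """
    return xbar1 - xbar2 + t_gamma * sp * math.sqrt(1 / n1 + 1 / n2)


bound = excess_upper_bound(9.79, 9.35, 0.297, 15, 15, 2.763)  # about .74 ppm
```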


IV. 5. Variance Contributed by Electrode

To identify the variance contributed by the new batch of electrodes, let us again discuss estimation on an element-by-element basis. We shall explicitly discuss the procedure and formulas for iron, say, with the understanding that the same procedure and formulas can be applied in turn for copper, aluminum, magnesium, etc.
We assume $n_1$ electrodes have been selected from the new batch, each to be used in analyzing a sample of the same oil, say a prepared standard containing 10 ppm of iron. Let $X_{11}, X_{12}, \ldots, X_{1n_1}$ be the $n_1$ iron readings produced by the spectrometer using these electrodes from the new batch. We also assume we have $n_2$ electrodes from a previously accepted batch, each used to analyze a sample from the same oil standard. Denote these iron readings by $X_{21}, X_{22}, \ldots, X_{2n_2}$. The total variance of these $n_1 + n_2$ iron readings, then, is a constant times the sum of the squares of each individual reading less the overall mean:

$$\sum_{i=1}^{2}\sum_{j=1}^{n_i}(X_{ij} - \bar{X})^2,$$

where the overall mean is

$$\bar{X} = \frac{1}{n_1 + n_2}\sum_{i=1}^{2}\sum_{j=1}^{n_i} X_{ij}.$$

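This total sum of squares can be partitioned into two parts:

$$\sum_{i=1}^{2}\sum_{j=1}^{n_i}(X_{ij} - \bar{X})^2 = \sum_{i=1}^{2}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2 + \frac{n_1 n_2}{n_1 + n_2}(\bar{X}_1 - \bar{X}_2)^2,$$

where

$$\bar{X}_1 = \frac{1}{n_1}\sum_{j=1}^{n_1} X_{1j}, \qquad \bar{X}_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} X_{2j}$$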
are the average readings for the two electrode types. The first of these terms,

$$\sum_{i=1}^{2}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2 = \sum_{j=1}^{n_1}(X_{1j} - \bar{X}_1)^2 + \sum_{j=1}^{n_2}(X_{2j} - \bar{X}_2)^2,$$

is just the sum of squares of the readings for each electrode type about its own average value: part of the variability of the $n_1 + n_2$ readings is given by the variability within readings made with the same electrode type. The remaining term,

$$\frac{n_1 n_2}{n_1 + n_2}(\bar{X}_1 - \bar{X}_2)^2,$$

is a constant times the square of the difference between the two averages: the remainder of the variability in the $n_1 + n_2$ readings is related to the difference in average readings of the two electrode types. This partition of the total sum of squares is frequently called an analysis of variance; it breaks the total variance into parts which can then be compared. The discussion of acceptance criteria in Section IV.2 above actually uses this same partition, although it is not described in that way.
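The partition can be checked numerically with a short sketch (hypothetical function name, standard library only). It also illustrates the link to Section IV.2: the squared t-statistic equals the between-types term divided by the within-types sum of squares over $n_1 + n_2 - 2$.

```python
import statistics


def anova_partition(x1, x2):
    """Split the total sum of squares of two groups into within and between parts."""
    n1, n2 = len(x1), len(x2)
    xbar1, xbar2 = statistics.mean(x1), statistics.mean(x2)
    grand = statistics.mean(x1 + x2)
    total = sum((x - grand) ** 2 for x in x1 + x2)
    within = sum((x - xbar1) ** 2 for x in x1) + sum((x - xbar2) ** 2 for x in x2)
    between = n1 * n2 / (n1 + n2) * (xbar1 - xbar2) ** 2
    return total, within, between


total, within, between = anova_partition([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
# Here total = 17.5, within = 4.0 and between = 13.5, so total == within + between.
# The squared t-statistic of Section IV.2 equals between / (within / (n1 + n2 - 2)).
t_squared = between / (within / (3 + 3 - 2))
```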

Isolation of the variance due to the electrode type may be done in a relative sense as follows. Let us assume that the variance of a reading using an electrode from the new batch is

$$V[X_{1j}] = \sigma^2 + \sigma_1^2,$$
where $\sigma^2$ is the variance due to the instrument, oil standard used, etc., and $\sigma_1^2$ is the contribution from the new electrode batch. Similarly, assume the variance of a reading from the previously accepted batch is

$$V[X_{2j}] = \sigma^2 + \sigma_2^2,$$

where $\sigma^2$ is the same as before, since the same instrument, oil standard, etc., are used with these readings, and $\sigma_2^2$ is the contribution from the old electrode batch. It can be shown that

$$s_1^2 = \frac{1}{n_1 - 1}\sum_{j=1}^{n_1}(X_{1j} - \bar{X}_1)^2$$

is an unbiased estimate of $V[X_{1j}]$, i.e., of $\sigma^2 + \sigma_1^2$, and that

$$s_2^2 = \frac{1}{n_2 - 1}\sum_{j=1}^{n_2}(X_{2j} - \bar{X}_2)^2$$

is an unbiased estimate of $V[X_{2j}] = \sigma^2 + \sigma_2^2$. The difference $s_1^2 - s_2^2$ then gives an unbiased estimate of $\sigma_1^2 - \sigma_2^2$, the difference in variance contributed by the two types of electrodes, since the term contributed by the instrument and standard cancels in forming the difference. If, for example, we found $s_1^2 = .8$ and $s_2^2 = .7$, then $s_1^2 - s_2^2 = .1$ is the estimated excess variance for the new electrode batch versus the previously accepted batch. Note that this measure is a function of the repeatability indices only and is unaffected by the accuracy indices of the two batches.
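A minimal sketch of this estimate, assuming the additive variance model above (the function name is hypothetical). `statistics.variance` uses the $n - 1$ divisor, so it returns $s_1^2$ and $s_2^2$ as defined here.

```python
import statistics


def excess_electrode_variance(new, old):
    """Estimate sigma_1^2 - sigma_2^2 as s_1^2 - s_2^2.

    Under the additive model, the instrument/standard component sigma^2
    cancels in the difference, leaving only the electrode contributions.
    """
    return statistics.variance(new) - statistics.variance(old)
```

Applied to the readings of the worked example in Section IV.2, the estimate is about $.065 - .111 = -.046$; a negative value, as there, gives no indication of excess variance contributed by the new batch.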